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Abstract 

In recursive linear models, the multivariate normal joint distribution of all 
variables exhibits a dependence structure induced by a recursive (or acyclic) 
system of linear structural equations. These linear models have a long tradition 
and appear in seemingly unrelated regressions, structural equation modelling, 
and approaches to causal inference. They are also related to Gaussian graphical 
models via a classical representation known as a path diagram. Despite the 
models' long history, a number of problems remain open. In this paper, we ad- 
dress the problem of computing maximum likelihood estimates in the subclass 
of 'bow-free' recursive linear models. The term 'bow-free' refers to the condition 
that the errors for variables i and j be uncorrelated if variable i occurs in the 
structural equation for variable j. We introduce a new algorithm, termed Resid- 
ual Iterative Conditional Fitting (RICF), that can be implemented using only 
least squares computations. In contrast to existing algorithms, RICF has clear 
convergence properties and finds parameter estimates in closed form whenever 
possible. 

Key words: Linear regression; Maximum likelihood estimation; Path diagram; 
Linear structural equation model; Recursive semi-Markov model 



1 Introduction 



A system of linear structural equations determines a linear model for a set of variables 
by dictating that, up to a random error term, each variable is equal to a linear 
combination of some of the remaining variables. Traditionally the errors are assumed 
to have a centered joint multivariate normal distribution. Prese nting a forma lism for 
simu ltaneously representing causal and statistical hypotheses (IPearll. l2000l: Spirtes 



et al.. bOQCh . these normal linear models, w hich are often called structural equation 
models, are widely used in the social sciences (|Bollenl . ll989l ) and many other contexts. 
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In seminal work, IWrightl (|192ll . Il934l ) introduced path diagrams, which are very 
useful graphical representations of structural equations. A path diagram is a graph 
with one vertex for each variable and directed and/or bi-directed edges. A directed 
edge % — * j indicates that variable i appears as covariate in the equation for variable 
j. The directed edges are thus in correspondence with the path coefficients, that 
is, the coefficients appearing in the linear structural equations. A bi-directed edge 
i <->• j indicates correlation between the err ors in the equations for var iables % and j. 



Graphs of this kind are also considered by IShpitser and Pearll (|2006l ) , who refer to 



them as recursive semi-Markovian causal models. 
1.1 A Motivating Example 

We will motivate the considered normal linear models with the following e xample 



which is adapted from a more complex longitudinal study considered in iRobins 



4200a). 

Consider a two-phase sequential intervention study examining the effect of ex- 
ercise and diet on diabetes. In the first phase patients are randomly assigned to a 
number of hours of exercise per week (Ex) drawn from a log-normal distribution. 
At the end of this phase blood pressure (BP) levels are measured. In the second 
phase patients are randomly assigned to a strict calorie controlled diet that produces 
a change in body-mass index (ABMI). The assigned change in BMI, though still 
randomized, is drawn, by design, from a normal distribution with mean depending 
linearly on log (Ex) and BP. The dependence here is due to practical and ethical con- 
siderations. Finally at the end of the second phase, triglyceride levels (Y) indicating 
diabetic status are measured. 

A question of interest is whether or not there is an effect of Ex on the outcome Y 
that is not mediated through the dependence of ABMI on Ex and BP. In other words, 
if there had been no ethical or practical restrictions, and the assignment (ABMI) in 
the second phase was completely randomized and thus independent of BP and Ex, 
would there still be any dependence between Ex and Y1 Note that due to underlying 
confounding factors such as life history and genetic background, we would expect to 
observe dependence between BP and Y even if the null hypothesis of no effect of Ex 
on Y was true and the second treatment (ABMI) was completely randomized. 

Our model consists of two pieces. First, the design of the study dictates that 

log(Ex) = a + £ Ex , (1.1) 
ABMI = 70 + 71 log(Ex) + 72 BP + e ABMI , (1.2) 

where e Ex ~ M(f),a\^) and £abmi ~ -^(0;°Abmi) are independent. This assignment 
model is complemented by a model describing how BP and Y respond to the prior 
treatments: 

BP = /3 + /3 1 log(Ex) + e BP , (1.3) 
Y = <5 + <5ilog(Ex) + <5 2 ABMI + £ y , (1.4) 
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Figure 1: Path diagram illustrating a two-phase trial with two treatments (Ex and 
ABMI) and two responses (BP and Y). Ex is randomly assigned, ABMI is random- 
ized conditional on BP and Ex. The bi-directed edge indicates possible dependence 
due to unmeasured factors (genetic or environmental). 

where (e B p, £y)* are centered bivariate normal and independent of e Ex and £Abmi- We 
denote the variances of e B p and e Y by a PP and a Y , respectively, and write a B p, Y for 
the possibly non-zero covariance of e B p and e Y . Figure [1] shows the path diagram for 
this structural equation model. 

Equations (|1.2|) and (|1.3|) simply specify conditional expectations that can 

be estimated in regressions. However, this is not the case in general with (jl.4p . 
Instead, 

E [Y | log(Ex), ABMI] = S + Si log(Ex) + 5 2 ABMI 

with 

di = di 2 2 2 , (1.5) 

h = h + 2 2 2 • (1.6) 

72 "bp "+~ °"abmi 

We see that 5i and 52 would have an interpretation as regression coefficients if: (i) the 
assignment of ABMI did not depend on BP (i.e., 72 = 0) and thus both treatments 
were completely randomized, or (ii) there were no dependence between e Y and e BP 
(i.e., cr BP ,y = 0). Similarly, in E[Y \ log (Ex), BP, ABMI], the coefficient of ABMI is 
equal to 82 but the coefficient for log (Ex) is S\ — f3\o BPX I ,(J bp- 

In this paper we consider likelihood-based methods for fitting a large class of 
structural equation models that includes the one given by (|l.lj) - (jl.4|) and can be used 
for consisten t estima t ion of par ameters such as Si . For alternative semi-parametric 
methods, see lRobinsI (|l999h and lGill and Robins! (|200lh . 



1.2 Challenges in Structural Equation Modelling 

A number of mathematical and statistical problems arise in the normal linear models 
associated with systems of structural equations: 

(i) Different path diagrams may induce the same statistical model, i.e., family of 
multivariate normal distributions. Such model equivalence occurs, for example, 
for the two path diagrams 1 — > 2 and 1 <— 2, which differ substantively by the 
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direction of the cause-effect relationship. The two associated statistical models, 
however, are identical, both allowing for correlation between the two variables. 



(ii) In many important special cases the path coefficient associated with a directed 
edge % —* j has a population interpretation as a regression coefficient in a 
regression of j on a set of variables including i. However, as seen already in 
this interpretation is not valid in general. 
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The parameters of the model may not be identifiable, so two different sets of 
parameter values may lea d to th e same population distribution; for an early 
review of this problem see iFisherl (|1966l ) . 



(iv) The set of parameterized covariance matrices may contain 'singularities' at 
which it cannot be approximated locally by a linear space. At 'singular' points, 
X 2 and normal approximation to the distribution of likelihood ratio tes ts and 



maxim um likelihood estimators (MLE) may not be valid; see for instance iDrton 
(|2008h . 



(v) Iterative procedures are typically required for maxim ization of the likelihood 
funct ion, which for some models can be multimodal (jDrton and Richardson . 



2004). Such multimodality typically occurs in small samples or under model 



misspecification. 

The problems listed may arise in models without unobserved variables and be- 
come only more acute in latent variable models. They are challenging in full gener- 
ality, but significant progress has been made in special cases such as recursive linear 
models with uncorrelated errors, which are also known as directed a cyclic graph 
(DAG) models or 'Bayesian' networks (jLauritzenl . Il996l ; iPearll . Il988l ). A normal 



DAG model is equivalent to a series of linear regressions, is always identified and has 
standard asymptotics. Under simple sample size conditions, the MLE exists almost 
surely and is a rational function of the data. Graphical modelling theory also solves 
the equivale nce problem (i) by chara cterizing all DAGs that induce the same statis- 
tical model (lAndersson et all 19971 ) . For more recent progre ss on the equivalence 
problem see lAli et al.l (|2005f ) and IZhang and Spirtesl (|200a ) . 



1.3 New Contribution 

The requirement of uncorrelated errors may be overly restrictive in many settings. 
While arbitrary correlation patterns over the errors may yield rather ill-behaved 
statistical models, there are subclasses of models with correlate d errors in which s ome 
of the nice properties of DAG models are preserved; compare iMcDonaldl ([2002]). In 
this paper we consider path diagrams in which there are no directed cycles and no 
'double' edges of the form i ±3 j (compare Def. 12.31 and f2.5|) . Since such double edges 
have been called 'bows', we call this class bow-free acyclic path diagrams (BAPs). 
An example of a BAP arose in our motivating example in ^1,1) see Figure [TJ While 
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instrumental variable models, which are much studied in economics, contain bows, 
most model s in ot h er so cial sciences are based on BAPs. For instance, all path 
diagrams in iBollenl ()1989l ) are BAPs. 



Bow-free acyclic path diagrams were also considered bv lBrito and Pearll (|2002l ) 
who showed that the associated normal linear models are almost everywhere iden- 
tifiable; see §2.21 for the definition. This result and other identification properties 
of BAP models are reviewed in Section [2j In Section [3] we give details on likeli- 
hood equations and Fisher-information of normal structural equation models. This 
sets the scene for our main contribution: the Residual Iterative Conditional Fitting 
(RICF) algorithm for maximization of the likelihood function of BAP models, which 
is presented in Section [H Standard software for structural equation modelling; cur - 
rently employs general-purpose optimization routines for this task (|Bollenl . Il989l ). 
Many of these algorithms, however, neglect constraints of positive definite ness on 
the co variance matrix and suffer from convergence problems. According to ISteiger 
(|200ll ). failure to converge is 'not uncommon' and presents significant challenges to 
novice users of existing software. In contrast, our RICF algorithm produces positive 
definite covariance matrix estimates during all its iterations and has very good con- 
vergence properties, as illustrated in the simulations in Section [5l Further discussion 
of RICF is provided in Section [6l 



2 Normal Linear Models and Path Diagrams 

Let Y = (Yi | i G V) G M v be a random vector, indexed by the finite set V, that 
follows a multivariate normal distribution Af(0, S) with positive definite covariance 
matrix E. A zero mean vector is assumed merely to avoid notational overhead. 
The models we consider subsequently are induced by linear structural equations as 
follows. 



2.1 Systems of Structural Equations and Path Diagrams 

Let {pa(i) | i G V} and {sp(i) | i G V} be two families of index sets satisfying 
i S" pa(z) U sp(i) for all i G V. Moreover, let the family {sp(i) | i G V} satisfy 
the symmetry condition that j G sp(i) if and only if i G sp(j). These two families 
determine a system of structural equations 

^=E i6pa (i)/%^ + ^ ieF, (2.1) 

whose zero- mean errors £j and Ej are uncorrelated if i £ sp(j), or equivalently, 
j sp(i). The equations in (|2.ip correspond to a path diagram, that is, a mixed 
graph G featuring both directed (— >) and bi-directed (<->) edges but no edges from a 
vertex i to itself (see Figures [1] and [2|) . The vertex set of G is the index set V, and 
G contains the edge j — > % if and only if j G pa(i), and the edge j <-* i if and only if 
j G sp(i) (or equivalently, i G sp(j)). Subsequently, we will exploit the path diagram 
representation of (|2.ip . If i — * j is an edge in G, then we call i a parent of j, and if 
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(a) 




(b) 




(c) 




Figure 2: Path diagrams that are (a) cyclic, (b) acyclic but not bow-free, (c) acyclic 
and bow-free. Only path diagram (c) yields a curved exponential family. 



i <-> j is in G then i is referred to as a spouse of j. Thus pa(i), sp(i) are, respectively, 
the sets of parents and spouses of i. 

Let G be a path diagram and define B(G) to be the collection of all V X V 
matrices B = (/%) that satisfy 



J 



i not in G 



Pa = o, 



(2.2) 



and are such that I — B is invertible. Let P(V) be the cone of positive definite V xV 
matrices and O(G) C P(V) the set of matrices = (wy) € P(V) that satisfy 



i 7^ j and j *-* i not in G 



UJij 



0. 



(2.3) 



(Here and in the sequel, a symbol such as V denotes both a finite set and its cardinal- 
ity.) The system (|2.ip associated with the path diagram G can be written compactly 
as Y = BY + e. If we assume that B E B(G) and that the error covariance matrix 
Var(e) = Q, is in O(G), then (|2.1|) has a unique solution Y that is a multivariate 
normal random vector with covariance matrix 



S = Var(y) = $ G (B, 0) := (I - B)' 1 ^! - B) 



-f 



(2.4) 



Here, / is the identity matrix and the superscript '— t' stands for transposition and 
inversion. 

The above considerations lead to the following definition of a linear model asso- 
ciated with a path diagram (or equivalently, a system of structural equations) . 

Definition 2.1. The normal linear model N(G) associated with a path diagram G 
is the family of multivariate normal distributions Af(0, S) with covariance matrix in 
the set P(G) = {(J - B)- 1 ^ - B)~* \ B £ B(G), G O(G)}. W^e co// f/ie map 
$q : B(G) x O(G) — > P(G) defined in \2.1$ the parameterization map o/N(G). 

Example 2.2. The path diagram G in Figure [2|a) depicts the equation system 



Y 3 = P 31 Y 1 +P 32 Y2 + £3, 



Y 2 = i3 21 Y 1 + (3 24 Y 4 + e 2 , 
Y 4 = (3 43 Y 3 + e 4 , 



where ei,£2,£3,ff4 are pairwise uncorrelated, that is, the matrices ft G O(G) are 
diagonal. This system exhibits a circular covariate-response structure as the path 
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diagram contains the directed cycle 2 — » 3 — > 4 — > 2. This feedback loop is reflected 
in the fact that det(J — B) = 1 — /?24/?43/332 for i? G B(G). Therefore, the path 
coefficients need to satisfy that /?24/543/?32 ^ 1 in order to lead to a positive definite 



covari ance matrix in P(G). This example is considered in more detail in iDrton 



(2008), where it is shown that the parameter space P(G) has singularities that lead 



to non-standard behavior of likelihood ratio tests. 

The models considered in the remainder of this paper will not have any feedback 
loops, i.e., they have the following structure. 

Definition 2.3. A path diagram G and its associated normal linear model N(G) are 
recursive or acyclic if G does not contain directed cycles, that is, there do not exist 
i, i\, . . . ,ik € V such that G features the edges i — > i\ — > 12 —>•••—> if. — > i. 

We will use the term acyclic rather than recursive, as some authors have used 
the term 'recursive' for path diagrams that are acyclic and cont ain no bi-directed 



edges . Acyclic path diagrams coincide with the summary graphs of lCox and Wermuth 



(1996). If G is acyclic, then the vertices in V can be ordered such that a matrix B 



that satisfies (|2.2|) is lower-triangular. It follows that 

det(I-B) = l. (2.5) 

In particular, I — B is invertible for any choice of the path coefficients (3ij, j — ► i in 
G, and the parameterization map <&g is a polynomial map. 



2.2 Bow-free Acyclic Path Diagrams (BAPs) 

The normal linear model N(G) associated with a path diagram G is everywhere 
identifiable if the parameterization map $g is one-to-one, i.e., for all Bq € B(G) and 
U G 0(G) it holds that 

$ G (B,n) = $ G (Bo,n ) =^ B = B &ndn = n . (2.6) 

If there exists a Lebesgue null set N G Q B(G) x 0(G) such that (|2.6|) holds for all 
(Bq,Qq) N G , then we say that N(G) is almost everywhere identifiable. 

Acyclic path diagrams may contain bows, that is, double edges it^j. It is easy 
to see that normal linear models associated with path diagrams with bows are never 
everywhere identifiable. However, they may sometimes be almost everywhere iden- 
tifiable as is the case for the next example. This example illustrates that almost 
everywhere identifiability is not enough to ensure regular behaviour of statistical 
procedures. 

Example 2.4. The path diagram G in Figure [2jb) features the bow 3 1£ 4. The 
associated normal linear model N(G) is also known as an instrumental variable 
model. The 9-dimensional parameter space P(G) is part of the hypersurface defined 
by the vanishing of the so-called tetrad <7i3<724 — <ti4<T23- It follows that the model 
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N(G) lacks regularity because the tetrad hypersurface has singularities at points 
£ 6 P(G) with o"i3 = 0"i4 = (J23 = o"24 = 0. These singularities occur if and only 
if /?3i = /?32 = 0, and correspond to points at which the identifiability property in 
(|2.6|) fails to hold. This lack of smoothness expresses itself statistically, for example, 
w hen tes t ing th e hypothesis P31 = (3%2 = in model N(G). Using the techniques 



m 



Drtonl (12008). the likelihood ratio statistic for this problem can be shown to have 
non-standard behavior with a large-sample limiting distribution that is given by the 
larger of the two eigenvalues of a 2 x 2-Wishart matrix with 2 degrees of freedom 
and scale parameter the identity matrix. 

Definition 2.5. A path diagram G and its associated normal linear model N(G) are 
bow-free if G contains at most one edge between any pair of vertices. If G is bow- free 
and acyclic, we call it a bow-free acyclic path diagram (BAP). 

As stressed in the introduction, BAPs are widespread in applications. Examples 
are shown in Figures [H E|c) and [6J Contrary to some path diagrams with bows, 
the normal linear models assciated with BAPs are always at least almost everywhere 
identifiable. 



Theorem 2.6 ( Brito and Pearll ( 2002 )). If G is a BAP, then the normal linear model 



N(G) is almost everywhere identifiable. 

Many BAP models are in fact everywhere identifiable. 



Theorem 2.7 ([Richardson and Spirted (120021 )). Suppose G is an ancestral BAP, 



that is, G does not contain an edge i «-> j such that there is a directed path j — > i\ — > 
■ ■ ■ — > if, — > i that leads from vertex j to vertex i. Then the normal linear model 
N(G) is everywhere identifiable. 

The next example shows that the condition in Theorem 12.71 is sufficient but 
not necessary for identification. The characterization of the class of BAPs whose 
associated normal linear models are everywhere identifiable remains an open problem. 

Example 2.8. The BAP G in Figure (2)^c) is not ancestral because it contains the 
edges 4 <-> 2 — > 3 — > 4. Nevertheless, the associated normal linear model N(G) is 
everywhere identifiable, which can be shown by identifying the parameters in B and 
f2 row-by-row following the order 1 < 2 < 3 < 4. It is noteworthy that the model 
N(G) in this example is not a Markov model, that is, a generic multivariate normal 
distribution in N(G) will exhibit no conditional independence relations. Instead, the 
entries of covariance matrices £ = (cr^-) £ P(G) satisfy that 

(0-11(722 - 0"l2)( cr 14<733 ~ C13O-34) = (<7i 3 (T24 - 0"i 4 <723) (c r 12<7l3 ~ CllO^)- (2.7) 

The constraint in ()2.7[) has a nice interpretation. Let (Yx, . . . ,Y±) have (positive 
definite) covariance matrix E = (o^), and define e2 = I2 — o~2i/o~uY± to be the 
residual in the regression of I2 on Y\ . Then (|2.7p holds for S if and only if Y\ and 
I4 are conditionally independent given ei and Y3. 
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Figure 3: Bow- free acyclic path diagram whose associated normal linear model is 
almost, but not everywhere, identifiable. The model is not a curved exponential 
family. 



An inspection of the proof of Theorem l2.6l given in lBrito and Pearll (|2002l ) reveals 
the following fact. 

Lemma 2.9. If the normal linear model N(G) associated with a BAP G is every- 
where identifiable, then the (bijective) parameterization map &g has an inverse that 
is a rational map with no pole on P(G). 



By (|2.5p . the parameterization map &g f° r a BAP G is polynomial and thus 
smooth. If is ration al and without p ole, then the image of <&g, i- e -i P(G) is a 
smooth manifold (see e.g. lEdwardsl . Il994l . II. 4). This has an important consequence. 



Corollary 2.10. // the normal linear model N(G) associated with a BAP G is 
everywhere identifiable, then N(G) is a curved exponential family. 



The theory of curved exponential families is discussed in iKass and Vosl (jl997l ). 
It implies in particular that maximum likelihood estimators in curved exponential 
families are asymptotically normal, and that likelihood ratio statistics comparing two 
such families are asymptotically chi-square regardless of where in the null hypothesis 
a true parameter is located. Unfortunately, however, Lemma 12.91 and Corollary 12.101 
do not hold for every BAP. 

Example 2.11. The normal linear model associated with the BAP G in Figure [3] is 
not a curved exponential family. In this model the identifiability property in ([2.6p 
breaks down if and only if (B, Q) satisfy that 

/321W14W24 - + 0J 2 4 = 0, /?32/?43^2 + = 0. 

It can be shown that the covariance matrices <3?g(-B,0) associated with this set of 
parameters yield points at which the 13-dimensional set P(G) has more than 13 
linearly independent tangent directions. Hence, P(G) is singular at these covariance 
matrices. 



3 Likelihood Inference 

Suppose a sample indexed by a set N is drawn from a multivariate normal distri- 
bution N(0,E) in the linear model N(G) associated with a BAP G = (V,E). We 
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group the observed random vectors as columns in the V x N matrix Y such that Yi m 
represents the observation of variable i on subject m. Having assumed a zero mean 
vector, we define the empirical covariance matrix as 

Assuming N > V, the matrix S is positive definite with probability one. Here, the 
symbols N and V are used to also denote set cardinalities. Models with unknown 
mean vector \i € can be treated by estimating fx by the empirical mean vector 
and adjusting the empirical covariance matrix accordingly; N > V + 1 then ensures 
almost sure positive definiteness of the empirical covariance matrix. 

3.1 Likelihood Function and Likelihood Equations 

Given observations Y with empirical covariance matrix S, the log-likelihood function 
£ : B(G) x O(G) -> K of the model N(G) takes the form 

£(B,U) = -y logdet(n) - —tr[(I-B) t n- 1 (I-B)S]. (3.2) 

Here we ignored an additive constant and used that det(I — B) = 1 if B G B(G); 
compare (j'2.5|) . Let /? = | i E V, j € pa(i)) and w = (wy | « < j, j G sp(i) or z = 
j) be the vectors of unconstrained elements in B and Q. Let P and Q be the matrices 
with entries in {0, 1} that satisfy vec(-B) = P(5 and vec(f2) = Quj, respectively, where 
vec(A) refers to stacking of the columns of the matrix A. Taking the first derivatives 
of £(B, O) with respect to (5 and uj we obtain the likelihood equations. 

Proposition 3.1. The likelihood equations of the normal linear model N(G) asso- 
ciated with a BAP G can be written as 

P t wec(n- 1 (I-B)S) =P'vec(lT 1 S) - P*(5 ® -1 )P/3 = 0, (3.3) 
Q*vec (fT 1 - fl-^J - 5)5(7 - S^ir 1 ) = 0, (3.4) 

where ® denotes the Kronecker product. 

In general, the likelihood equations need to be solved iteratively. One possible 
approach proceeds by alternately solving (|3.3|) and (|3.4|) for /? and w, respectively. 
For fixed w, (|3.3p is a linear equation in [i and easily solved. For fixed j3, (|3.4p 
constitutes the likelihood equations of a multivariate normal covariance model for 
e = (I — B)Y, which is specified by requiring that Vtij = whenever the edge i <-> j 
is not in G. The solution of (|3.4p requires, in general, another iterative method. As 
an alternative to this nesting of two iterative methods, we propose in Section H] a 
method that solves (|3 .3|) and (|3.4p in joint updates of /? and w. 
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3.2 Fisher-Information 



Large-sample confidence intervals for (f3, uS) can be obtained by approximating the 
distribution of the MLE (/?, u>) by the normal distribution with mean vector (/?, u) 
and covariance matrix /(/?, Here, I(/3,u>) denotes the Fisher-information, 

which, as shown in Appendix [A] is of the following form. 

Proposition 3.2. The (expected) Fisher-information of the normal linear model 
N(G) associated with a BAP G is 



F*(E®n -1 )P P t [(I-B)- 1 (8)0- 1 ]Q 



The Fisher-information in Proposition 13.21 need not be block-diagonal, in which 
case the estimation of the covariances uj affects the asymptotic variance of the MLE 
p. However, this does not ha ppen for bi- directed c hain graphs, which form one of 



the model classes discussed in IWermuth and Cox! (|2004l ) . A path diagram G is a 



bi-directed chain graph if its vertex set V can be partitioned into disjoint subsets 
7*1, . . . , 7t, known as chain components, such that all edges in each subgraph G Tt are 
bi-directed and edges between two subsets t s ^ r t are directed, pointing from t s to 
Tt, if s < t. Since bi-directed chain graphs are ancestral graphs the associated normal 
linear models are everywhere identifiable. 

Proposition 3.3. For a BAP G, the following two statements are equivalent: 

(i) For all underlying covariance matrices E G P(G), the MLEs of the parameter 
vectors (3 and u of the normal linear model N(G) are asymptotically indepen- 
dent. 

(ii) The path diagram G is a bi-directed chain graph. 

A proof of Proposition l3.3l is given in Appendix[AJ This result is an instance of the 
asymptotic independen ce of mean and natural pa rameters in mixed parameterization 
of exponential families ( Barndorff-Nielsen . 19781 ). 



4 Residual Iterative Conditional Fitting 

We now present an algorithm for computing the MLE in the normal linear model 
N(G) associated with a BAP. The algori t hm ex tends the iterative conditional fit- 



ting (ICF) procedure of IChaudhuri et al.l (|20071 ). which is for path diagrams with 



exclusively bi-directed edges. 

Let Yi € l w denote the i-ih row of the observation matrix Y and Y—% = Yyyuy 
the (V \ {i}) x N submatrix of Y. The ICF algorithm proceeds by repeatedly 
iterating through all vertices i £ V and carrying out three steps: (i) fix the marginal 
distribution of YLj, (ii) fit the conditional distribution of Yi given YLj under the 
constraints implied by the model N(G), and (iii) obtain a new estimate of E by 
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combining the estimated conditional distribution {Yi | Y_$) with the fixed marginal 
distribution of Y-i . The crucial point is then that for path diagrams containing only 
bi-directed edges, the problem of fitting the conditional distribution for (Yi | 
under the constraints of the model can be rephrased as a least squares regression 
problem. Unfortunately, the consideration of the conditional distribution of (Yi \ YL-i) 
is complicated for path diagrams that contain also directed edges. However, as we 
show below, the directed edges can be 'removed' by consideration of residuals, which 
here refers to estimates of the error terms e = (I — B)Y . Since it is based on this 
idea, we give our new extended algorithm the name Residual Iterative Conditional 
Fitting (RICF). 



4.1 The RICF Algorithm 

The main building block of the new algorithm is the following decomposition of 
the log-likelihood function. We adopt the shorthand notation Xq for the C x N 
submatrix of a D x N matrix X, where CCD. 

Theorem 4.1. Let G be a BAP and i E V. Let \\x\\ 2 = x f x and define 

^ii.—i — ^2,— i^—i (4*1) 

to be the conditional variance of Si given e_j; recall that = (fi_ »_ Then 

the log-likelihood function £(B, Q) of the model N(G) can be decomposed as 

£(B,U) = - — logWjj.-j - \\Yi - fij (Pa (i)l^a(i) - fi iiS p(j)(^I 4 1 _ i e-i) sp (j)|| 2 

Proof. Forming e = (I — B)Y, we rewrite (13.2p as 

N . , 1 



£(B, n) = -— logdet(n) - -tr(fi~W) =: £(Q \ e). (4.2 
Using the inverse variance lemma ( Whittaker . 199ol . Prop. 5.7.3), we partition 



as 



-1 „ II 2 



We obtain that the log-likelihood function in (|4.2|) equals 

| e) = - — log^ii.-j - \\£i - Oj -iO_j_j£- M | 

- j logdet(fi_ i ,_ i ) - ^(nzl^e-ieU). (4.3) 
By definition, £j = Y^ — B ipa ^Y pa ^y Moreover, under the restrictions (|2.3[) . 

= ^i,sp(i) (^-jj—i £ -i)sp(i) : 

which yields the claimed decomposition. □ 



12 



The log-likelihood decomposition (|4.2p is essentially based on the decomposition 
of the joint distribution of e into the marginal distribution of e_j and the conditional 
distribution (£j | £_j). This leads to the idea of building an iterative algorithm 
whose steps are based on fixing the marginal distribution of e_j and estimating a 
conditional distribution. In order to fix the marginal distribution e_j we need to fix 

the submatrix S~i comprising all but the i-th row and column of f2 as well as the 

submatrix B_iy, which comprises all but the i-th row of B. With O-^-j and B_iy 
fixed we can compute e_j as well as the pseudo-variables, defined by 

Z_ i = n: i 1 > _ i e_ i . (4.4) 

From (|4.2p . it now becomes apparent that, for fixed fi-i -j and B^y, the max- 
imization of the log-likelihood function £(B,£l) can be solved by maximizing the 
function 

((Ai)jepa(i)) ( w ifc)fcGsp(i) ) Wii.—i) h- > 

AT 1 II x - x 2 

-— logafo.-j- — ||Y t - 2^ Pij Y j ~ ^ikZkW (4.5) 

j'epa(i) fcGsp(i) 

over RP a W x M s pW x (0, oo). The maximizers of (|4.5p . however, are the least squares 
estimates in the regression of Yi on both (Yj \ j G pa(i)) and (Zk | k G sp(i)). 

Employing the above observations, the RICF algorithm for computing the MLE 
(B,£l) repeats the following steps for each i G V: 

1. Fix Q-i-i and B-iy, and compute residuals e_j and pseudo- variables Z S pU\; 

2. Obtain least squares estimates of fy, j G pa(i), Wjfc, G sp(i), and w^.-j by 
regressing response variable Yi on the covariates Yj, j G pa(i) and Zk, k G sp(i); 

3. Compute an estimate of ton = Ua-i + fli^-iUz] _i^l-i t i using the new estimates 
and the fixed parameters; compare (|4.1|) . 

After steps [1] to [3l we move on to the next vertex in V. After the last vertex in V 
we return to consider the first vertex. The procedure is continued until convergence. 

Example 4.2. For illustration of the regressions performed in RICF, we consider 
the normal linear model associated with the BAP G in Figure E^c). The parameters 
to be estimated in this model are fai, 031, P32, P43 and wu, UJ22, ^33, W44, W24- 

Vertex 1 in Figure [2]^c) has no parents or spouses, and its RICF update step 
consists of a trivial regression. In other words, the variance wu is the unconditional 
variance of Yi with MLE (b\\ = S\\. For the remaining vertices, the corresponding 
RICF update steps are illustrated in Figure 01 where the response variable Yi in the 
i-th update step is shown as a square node while the remaining variables are depicted 
as circles. Directed edges indicate variables acting as covariates in the least squares 
regression. These covariates are labelled according to whether the random variable 
Yj, or the pseudo- variable Zj defined in (|4.4p . is used in the regression. Note that 
since sp(3) = 0, repetition of steps HH3] in ^4.11 is required only for i G {2, 4}. 
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Figure 4: Illustration of the RICF update steps in Example l4.2[ The structure of each 
least squares regression is indicated by directed edges pointing from the predictor 
variables to the response variable depicted by a square node. (See text for details.) 



In RICF, the log-likelihood function l(B,£l) from (|3.2p is repeatedly maximized 
over sections in the parameter space defined by fixing the parameters Q-i-i, and 
B-iy. RICF thus is an iterative partial maximization algorithm and has the follow- 
ing convergence properties. 

Theorem 4.3. If G is a BAP and the empirical covariance matrix S is positive 
definite, then the following holds: 

(i) For any starting value (B°,Q°) £ B(G) x 0(G), RICF constructs a sequence 
of estimates (B s ,Cl s ) s in B(G) x 0(G) whose accumulation points are local 
maxima or saddle points of the log-likelihood function £(B,ft). Moreover, eval- 
uating the log-likelihood function at different accumulation points yields the 
same value. 

(ii) If the normal linear model N(G) is everywhere identifiable and the likelihood 
equations have only finitely many solutions then the sequence (B s ,£l s ) s con- 
verges to one of these solutions. 

Proof. Let ^(S) be the log-likelihood function for the model of all centered multi- 
variate normal distributions on W v . If S is positive definite then the set C that 
comprises all positive definite matrices £ G R VxV at which £(E) > £(B°, Cl°) is com- 
pact. In particular, the log-likelihood function in (|3.2|) is bounded, and claim (i) 
can be derived from genera l result s about iterative partial maximization algorithms; 
see e.g. iDrton and Eichlerl ([2006). For claim (ii) note that if N(G) is everywhere 
identifiable, then the compact set C has compact preimage cj)^}{C) under the model 
parameterization map; recall Lemma 12.91 □ 

Remark 4.4. If the normal linear model N(G) associated with a BAP G is not 
everywhere identifiable, then it is possible that a sequence of estimates (B s ,Ct s ) s 
produced by RICF diverges and does not have any accumulation points. In these 
cases, however, the corresponding sequence of covariance matrices &g(B S i Q s )s still 
has at least one accumulation point because it ranges in the compact set C exhibited 
in the proof of Theorem 14.31 Divergence of (B s ,Cl s ) s occurs in two instances in 
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the simulations in ^ compare Table [TJ In both cases, the sequence &g{B s , Cl s ) 
converges to a positive definite covariance matrix. 



4.2 Computational Savings in RICF 

If G is a DAG, i.e., an acyclic path diagram without bi-directed edges , then the 



MLE (B, O) in N(G) can be found in a finite number of regressions (e.g. IWermuthl . 



1980). However, we can also run RICF. Since in a DAG, sp(i) = for all i € V, 
step [2] of RICF regresses variable Yj solely on its parents Yj, j € pa(i). Not involving 
pseudo-variables that could change from one iteration to the other, this regression 
remains the same throughout different iterations, and RICF converges in one step. 

Similarly, for a general BAP G, if vertex i G V has no spouses, sp(i) = 0, then the 
MLE of .Bj ,pa(i) an d oju can be determined by a single iteration of the algorithm. In 
other words, RICF reveals these parameters as being estimable in closed form, namely 
as rational functions of the data. (This occurred for vertex i = 3 in Example 14.21 ) 
It follows that, to estimate the remaining parameters, the iterations need only be 
continued over vertices j with sp(j) ^ 0. 

For further computational savings note that ^dis(«),y\(dis(i)u{i}) = 0, where dis(i) = 
{j I j " " " ^ hj *'}■ Hence, since sp(i) C dis(i), 

(^-t -i £ -i)sp(i) = (^dis(i),dis(i) £dis ( l )) s PW' 



see iKosterl (jl999i . Lemma 3.1.6) and iRichardson and Spirtesl (|2002l . Lemma 8.10). 
Since £ dis(i) = ld is(i) - J B dis(i)ipa(dis(i)) y pa(dis(i)) , it follows that in the RICF update step 
for vertex i attention can be restricted to the variables in {i}Upa(i)Udis(i)Upa(dis(i)). 

Finally, note that while the RICF algorithm is described in terms of the entire 
data matrix Y, the least squares estimates computed in its iterations are clearly 
functions of the empirical covariance matrix, which is a sufficient statistic. 



5 Simulation Studies 

In order to evaluate the performance of the RICF algorithm we consider two scenar- 
ios. First, we fit linear models based on randomly generated BAPs to gene expression 
data. This scenario is relevant for model selection tasks, and we compare RICF's 
performance in this problem to that of algorithms implemented in software for struc- 
tural equation modelling. Second, we study how RICF behaves when it is used to fit 
larger models to data simulated from the respective model. In contrast to the first 
scenario, the second scenario involves models that generally fit the considered data 
well. 



5.1 Gene Expression Data 

We consider data on gene expression in Arabidopsis thaliana from lwille et al. ( 2004 ). 
We restrict attention to 13 genes that belong to one pathway: DXPS1-3, DXR, MCT, 
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Table 1: Fitting simulated BAPs to gene expression data using RICF, LISREL and 
'sem'. Each row is based on 1000 simulations. Running time is average CPU time 
(in sec.) for the cases in which all three algorithms converged. (See text for details.) 







No convey 


$ence 


All 


All 


Running time 


d 


b 


RICF 


LIS 


SEM 


converge 


agree 


RICF 


LIS 


SEM 


0.05 


0.05 





36 


47 


941 


940 


0.03 


0.02 


1.15 




0.1 





177 


221 


746 


739 


0.09 


0.03 


1.58 




0.2 





499 


599 


347 


333 


0.21 


0.04 


2.71 


0.1 


0.05 





32 


36 


951 


949 


0.04 


0.03 


1.58 




0.1 





137 


193 


786 


780 


0.09 


0.03 


2.09 




0.2 





440 


610 


364 


354 


0.25 


0.04 


3.43 


0.2 


0.05 





19 


39 


958 


954 


0.05 


0.03 


2.67 




0.1 





91 


176 


815 


808 


0.13 


0.04 


3.34 




0.2 


1 


326 


520 


461 


452 


0.33 


0.05 


5.03 


0.3 


0.05 





16 


38 


960 


957 


0.06 


0.04 


4.04 




0.1 





59 


136 


859 


850 


0.17 


0.04 


4.96 




0.2 


1 


225 


471 


519 


490 


0.40 


0.06 


6.97 



CMK, MECPS, HDS, HDR, IPPI1, GPPS, PPDS1-2. Data from n = 118 microarray 
experiments are available. We fit randomly generated BAP models to these data 
using RICF and two alternative methods. 

The BAP models are generated as follows. For each of the 78 possible pairs of 
vertices i < j in V = {1, . . . , 13} we draw from a multinomial distribution to generate 
a possible edge. The probability for drawing the edge i — > j is d, and the probability 
for drawing i «-► j is b so that with probability 1 — d — b there is no edge between i and 
j. We then apply a random permutation to the vertices to obtain the final BAP. For 
each of twelve combinations (d,b) with d = 0.05,0.1,0.2,0.3 and b = 0.05,0.1,0.2, 
we simulate 1000 BAPs. The expected number of edges thus varies between 7.8 and 
39. 

For fitting the simulated BAPs to the gene expr ession data, we implemented 
RICF in the statistical programming environment R (IR Development Core Team! 



20071 ). As alternatives, we con sider the R package 'sem' (IFoxi . 120061 ) and the very 



widely used software LISREL (jjoreskog and Sorboml . 119971 ) in its student version 
8.7 for Linux (student versions are free but limited to 15 variables). Both these 
latter programs employ general purpose optimizers, e.g., 'sem' makes a call to the R 
function 'nlm'. 

Our simulation results are summarized in Table [TJ Each row in the table cor- 
responds to a choice of the edge probabilities d and b. The first three columns 
count how often, in 1000 simulations, the three considered fitting routines failed to 
converge. The starting values of LISREL and 'sem' were set according to program 
defaults, and RICF was started by setting and f^°) equal to the MLE in the 
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DAG model associated with the DAG obtained by removing all bi-directed edges 
from the considered BAP. 

LISREL and 'sem' failed to converge for a rather large number of models. The 
LISREL output explained why convergence failed, and virtually all failures were due 
to the optimizer converging to matrices that were not positive definite. The remedy 
would be to try new starting values but doing this successfully in an automated 
fashion is a challenging problem in itself. For RICF convergence failure arose in 
only two cases. In both cases the RICF estimates (B, O) had some diverging entries. 
Despite the divergence in (B, f2)-space, the sequence of associated covariance ma- 
trices &g(B,Q) computed by RICF converged, albeit very slowly. Recall that this 
phenomenon is possible in models that are almost, but not everywhere, identifiable 
(Remark I4.4p . In these examples LISREL produced similarly divergent sequences 
with approximately the same likelihood, and 'sem' reported convergence in one case 
but gave an estimate whose likelihood was nearly 40 points smaller than the inter- 
mediate estimates computed by LISREL and RICF. 

The columns labelled 'All converge' and 'All agree' in Table Q] show how often all 
methods converged, and when this occurred, how often the three computed maxima 
of the log-likelihood function were the same up to one decimal place. Since all meth- 
ods are for local maximization, substantial disagreements in the computed maxima 
can occur if the likelihood function is multimodal. 

Finally, the last three columns give average CPU time use for the cases in which 
all three algorithms converge. These are quoted to show that RICF is competitive 
in terms of computational efficiency, but for the following reasons the precise times 
should not be used for a formal comparison. On the one hand, LISREL is fast because 
it is compiled code. This is not the case for the R-based 'sem' and RICF. On the 
other hand, the fitting routines in LISREL and 'sem' not only compute the MLE but 
also produce various other derived quantities of interest. This is in contrast to our 
RICF routine, which only computes the MLE. 

5.2 Simulated Data 

In order to demonstrate how RICF behaves when fitting larger models we use the al- 
gorithm on simulated data. We consider different choices for the number of variables 
p and generate random BAPs according to the procedure used in £15.11 We limit our- 
selves to two different settings for the expected number of edges, choosing d = 0.1 or 
d = 0.2 and setting b = d/2 in each case. For each BAP G, we simulate a covariance 
matrix £ = (I - 5)^(7 - B)'* € P(G) as follows. The free entries in B G B(G) 
and the free off-diagonal entries in Q G O(G) are drawn from a Af(0, 1) distribution. 
The diagonal entries con are obtained by adding a draw from a xf-distribution to the 
sum of the absolute values of the off-diagonal entries in the i-th row of 0. This makes 
diagonally dominant and thus positive definite. Finally, we draw a sample of size 
n from the resulting multivariate normal distribution N(G). For each distribution 
two cases, namely n = 3p/2 and n = Wp, are considered to illustrate sample size 



17 



100 



10 



0.1 



0.01 



d = 0.1, n = 3p/2 



° 8 a I 

8 ' B I 

° : S ; 

1 B : 



8 



i i i r~ 

10 20 30 40 50 



d = 0.1, n = 10p 



• : a 



I 



| H 



i i i r~ 
10 20 30 40 50 



d = 0.2, n = 3p/2 

o 



° a i T 

1 8 $ T 

9 ! B - 

i 9 : 



~~ i i i i r~ 
10 20 30 40 50 



d = 0.2, n = 10p 



§ i 



-r g 6 



B 



~~ i i i i r~ 
10 20 30 40 50 



p (number of variables) 



Figure 5: Boxplots of CPU times (in sec. on log 10 -scale) used by RICF when fitting 
BAP models to simulated data. Each boxplot summarizes 500 simulations. The 
number of variables is denoted by p, the sample size is n, and the parameter d 
determines the expected number of edges of the simulated BAPs (see text for details). 



effects. For each combination of p, d and n, we simulate 500 BAPs and associated 
data sets. 

Figure [5] summarizes the results of our simulations in boxplots. As could be 
expected, the running time for RICF increases with the number of variables p and 
the expected number of edges in the BAP (determined by d). Moreover, the running 
time decreases for increased sample size n, which is plausible because the empirical 
covariance matrix of a larger sample will tend to be closer to the underlying parameter 
space P(G). The boxplots show that even with p = 50 variables the majority of 
the RICF computations terminate within a few seconds. However, there are also a 
number of computations in which the running time is considerably longer, though 
still under two minutes. This occurs in particular for the denser case with smaller 
sample size (d = 0.2 and n = 3p/2). 



6 Discussion 



As mentioned in the introduction, normal linear models associated with path dia- 
grams are employed in many applied disciplines. The models, also known as struc- 
tural equation models, have a long tradition but remain of current interest i n par 



ticular due to the m o re rec ent developments in causal inference; compare e.g. iPearl 



(2000); ISpirtes et al.l (|2000l ). Despite their long tradition, however, many mathemat- 
ical, statistical and computational problems about these models remain open. 

The new contribution of this paper is the Residual Iterative Conditional Fit- 
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Figure 6: Path diagram for seemingly unrelated regressions. 



ting (RICF) algorithm for maximum likelihood estimation in BAP models. Software 
for computation of MLEs in structural equation models often employs optimiza- 
tion methods that are not designed to deal with positive definiteness constraints on 
covariance matrices. This can be seen in particular in Table [J which shows that 
two a vailable pro grams, LISREL (jjoreskog and Sorboml . 119971 ) and the R package 
'sem' (jFoxl . 120061 ). fail to converge in a rather large num ber of problem s. This is 
in line with previous experience by other authors; see e.g. ISteigerl (J200l|). Our new 
RICF algorithm, on the other hand, does not suffer from these problems. It has 
clear convergence properties (Theorem I4.3|) and can handle problems with several 
tens of variables (see Figure [5]). In addition, RICF has the desirable feature that it 
estimates parameters in closed form (in a single cycle of its iterations) if this is pos- 
sible. If applied to a model based on a directed acyclic graph (DAG), the algorithm 
converges in a single cycle and performs exactly the regressions commonly used for 
fitting multivariate normal DAG models. This feature and the fact that RICF can 
be implemented using nothing but least squares computations make it an attractive 
alternative to less specialized optimization methods. 

In another spec ial cas e , nam ely seemingly unrelated regressions, RICF reduces 
to the algorithm of iTelserl (|1964l ). A path diagram representing seemingly unrelated 
regressions is shown in Figure [6l The variables Y\ , Yz and I3 are then commonly 
thought of as covariates. Since they have no spouses, the MLEs of the variances wy, 
CJ22 and W33 are equal to the empirical variances sn, S22 and S33. For the remaining 
variables Y$, i = 4, 5, RICF performs regressions on both the "covariates" ^ a (i) arm 
the residual £j, j G {4, 5} \ {i}. These are precisely the steps performed by Telser. 

Existing structural equation modelling software also fits models with latent vari- 
ables, whereas the RICF algorithm applies only to BAP models without latent vari- 
ables. Howeve r, RICF could be used to implement the M-step in the EM algorithm 
(|Kiiveril . [1987]) in order to fit latent variable models. This EM-RICF approach would 
yield an algorithm with theoretical convergence properties. 

Finally, we emphasize that the RICF algorithm is determined by the path di- 
agram. However, different path diagrams may induce the same statistical model; 
recall point §j§ in H1.2\ in the introduction. This model equivalence of path diagrams 
may be exploited to find a diagram for which the running time of RICF is short. Rel- 
evant grap hical const r uctio ns for this problem are described in lDrton and Richardson 
d2008h and lAli et all l|2005l ). 
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A Proofs 

Proof of Proposition \3.S[ Let (3 and oj be the vectors of unconstrained elements in 
B and O, respectively. The second derivatives of the log-likelihood function with 
respect to (3 and uj are: 

N ■ P^Sf&ir^P, (A.l) 
N ■ P l [S{I - BfVL- 1 ®£l- l ]Q, (A.2) 

— q 1 { [n -1 ® n- l {i - b)s{i - Bfn- 1 ] (a.3) 
+ - b)s(i - B^n- 1 ® ir 1 ] }q. 

Replacing S by E[5] = (J — J B)- 1 ^(I - J3) - * in ([ATT]) - <|X2|> yields the claim. □ 

Proof of Proposition 13.31 If G is a bi-directed chain graph, then the submatrix B Ttn = 
for all t, while for s ^ t we have £l Ta ,T t = 0. In this case the second derivative of the 
log-likelihood function with respect to j3ij and uj^i is equal to d 2 £(B,Q)/d/3ij duj^i = 
[(I — B) _1 ]ji (f2 -1 )j£. Now [(/ — B)~ 1 ]ji may only be non-zero if j = I or / is an 
ancestor of j, that is, if there exists a directed path I — ¥ ji — ► • • • — > j m — > j in G. On 
the other hand, (r^" 1 )^ = whenever i and k are not in the same chain component. 
Therefore, the second derivative in (|A.2|) is equal to zero. 

Conversely, it follows that the second derivative in (|A.2[) vanishes for all param- 
eters only if the graph belongs to the class of bi-directed chain graphs. □ 



d 2 £(B,n) 

dp dp 
d 2 £(B,n) 

d 2 £(B,n) 
dujduj 1 
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